Data Quality in Imitation Learning
In supervised learning, the question of data quality and curation has been sidelined in recent years in favor of increasingly powerful and expressive models that can ingest internet-scale data. However, in offline learning for robotics, we simply lack internet-scale data, so high-quality datasets are a necessity. This is especially true in imitation learning (IL), a sample-efficient paradigm for robot learning from expert demonstrations. Policies learned through IL suffer from state distribution shift at test time due to compounding errors in action prediction, which leads to unseen states from which the policy cannot recover.

Instead of designing new algorithms to address distribution shift, an alternative perspective is to develop new ways of assessing and curating datasets. There is growing evidence that the same IL algorithms can have substantially different performance across different datasets. This calls for a formalism for defining metrics of data quality that can further be leveraged for data curation.

In this work, we take a first step toward formalizing data quality for imitation learning through the lens of distribution shift: a high-quality dataset encourages the policy to stay in distribution at test time. We propose two fundamental properties that are necessary for a high-quality dataset: (i) action divergence, the mismatch between the expert and learned policy at certain states; and (ii) transition diversity, the noise present in the system for a given state and action. We investigate the combined effect of these two key properties in imitation learning theoretically, and we empirically analyze models trained on a variety of data sources. We show that state diversity is not always beneficial, and we demonstrate how action divergence and transition diversity interact in practice.
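The interaction between action divergence and transition diversity can be illustrated with a toy one-dimensional rollout. This is a minimal sketch, not the paper's actual model: the dynamics, the 0.3 in-distribution threshold, and the `eps`/`sigma` parameters are all hypothetical choices standing in for action divergence and transition noise.

```python
import numpy as np

def rollout(eps, sigma, horizon=100, seed=0):
    """Toy 1-D rollout. The expert action a*(x) = -x keeps the state at 0.
    The cloned policy reproduces the expert up to a small action error eps
    on states it saw in training (|x| <= 0.3), but on unseen states it
    outputs no corrective term at all, so errors compound unchecked."""
    rng = np.random.default_rng(seed)
    x = 0.0
    for _ in range(horizon):
        if abs(x) <= 0.3:
            a = -x + eps          # in distribution: near-expert action
        else:
            a = eps               # out of distribution: no recovery
        x = x + a + sigma * rng.normal()  # sigma models transition noise
    return abs(x)

# With small action divergence and no transition noise, the policy
# settles at |x| = eps and stays in distribution.
in_dist = rollout(eps=0.05, sigma=0.0)

# The same action divergence combined with transition noise can push the
# state past the training distribution, after which the error is never
# corrected and drifts instead.
out_dist = rollout(eps=0.05, sigma=0.3)
```

In this sketch neither property is harmful alone; it is their combination that produces test-time distribution shift, which is the intuition the abstract formalizes.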
Top Big AI Trends and Challenges Impacting Media, Advertising & Entertainment Industry
I recently interviewed some of the top data science leaders from Comcast/Freewheel, Condé Nast, ViacomCBS, Audoir, USA Today Network, and Samba TV on the biggest trends, challenges, and opportunities they see for ML & AI in media, advertising, & entertainment -- and what the future may hold. What are some of the biggest trends you'll see being adopted by the entertainment and media industries? Christopher Whitely, Senior Director of Applied Analytics at Comcast/FreeWheel, shares "There are a few areas that we'll see adopted by M&E industries in the coming months and years, including more contextual advertising, where advertising creative assets are matched to appropriate program content algorithmically. Federated learning is also a new trend, which refers to modeling using machine learning without sharing data sets. Privacy is important, so I expect we'll see continued use of aggregated customer segments and clean rooms for marketing and analytics. Also, lookalike models will help advertisers reach potential customers and optimize campaigns for the greatest effect."
- Media (1.00)
- Information Technology > Security & Privacy (1.00)
How LinkedIn, Uber, Lyft, Airbnb and Netflix are Solving Data Management and Discovery for Machine Learning Solutions
When it comes to machine learning, data is certainly the new oil. The processes for managing the lifecycle of datasets are some of the most challenging elements of large-scale machine learning solutions. Data ingestion, indexing, search, annotation, and discovery are some of the aspects required to maintain high-quality datasets. The complexity of these challenges increases linearly with the size and number of the target datasets. While it is relatively easy to manage training datasets for a single machine learning model, scaling that process across thousands of datasets and hundreds of models can become nothing short of a nightmare. Some of the companies at the forefront of machine learning innovation, such as LinkedIn, Uber, Netflix, Airbnb, and Lyft, have certainly experienced the magnitude of this challenge, and they have built specific solutions to address it.
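The ingestion, indexing, search, and discovery aspects listed above can be sketched as a minimal in-memory dataset registry. This is an illustrative toy, not any of these companies' actual systems (which are distributed services with far richer metadata); all class and field names here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """One entry in a hypothetical registry: the metadata that ingestion,
    search, annotation, and discovery all operate on."""
    name: str
    owner: str
    schema: dict                                  # column name -> type
    tags: set = field(default_factory=set)
    annotations: dict = field(default_factory=dict)

class DatasetRegistry:
    """Toy registry: ingest dataset metadata, index it by tag, search it."""
    def __init__(self):
        self._records = {}    # name -> DatasetRecord
        self._by_tag = {}     # inverted index: tag -> set of dataset names

    def ingest(self, record):
        self._records[record.name] = record
        for tag in record.tags:
            self._by_tag.setdefault(tag, set()).add(record.name)

    def search(self, tag):
        """Discovery: list all dataset names carrying a given tag."""
        return sorted(self._by_tag.get(tag, set()))

registry = DatasetRegistry()
registry.ingest(DatasetRecord("rides_2023", "mobility-team",
                              {"ts": "timestamp", "fare": "float"},
                              tags={"rides", "training"}))
registry.ingest(DatasetRecord("sessions", "web-team",
                              {"user": "str"}, tags={"training"}))
print(registry.search("training"))
```

The inverted index is the key design choice: it makes discovery a lookup rather than a scan, which is what keeps search tractable as the number of datasets grows from one to thousands.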
- Education (1.00)
- Information Technology > Services (0.97)
- Transportation > Passenger (0.62)